{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "# 文件的应用"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## csv文件 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CSV是文本文档，所以对文本进行读写的方法都适用于CSV格式文件的数据处理。而CSV格式的文件里的数据基本上都是由行和列构成的二维数据，可以使用列表嵌套的方法对其进行处理。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## 实例 9.5 读文件处理文件中的数据 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "文件<a href=\"https://data.educoder.net/api/attachments/3464254?type=office&disposition=inline\" target=\"_blank\">5.7 score.csv</a>  中数据如下，读取其中的数据并进行处理。"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "姓名,C语言,Java,Python,C#,C++\n",
    "罗明,95,96,85,63,91\n",
    "朱佳,75,93,66,85,88\n",
    "李思,86,76,96,93,67\n",
    "郑君,88,98,76,90,89\n",
    "王雪,99,96,91,88,86"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "按要求完成以下操作：\n",
    "1. 请读取文件中的数据，每行数据根据逗号切分为列表，输出二维列表。\n",
    "2. 计算每位同学的总分附加到列表中各门课程成绩后面\n",
    "3. 根据每名同学的总分进行降序排序输出\n",
    "4. 将排序后的结果写入到新文件 “9.2 score_total.csv” 中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# 读文件中的数据为二维列表\n",
    "\n",
    "score_ls = []        # 创建空列表\n",
    "with open('/data/bigfiles/5,7 score.csv','r',encoding='utf-8') as fr:\n",
    "    for line in fr:  # 遍历文件对象\n",
    "    \tscore_ls.append(line.strip().split(','))  # 文件每行切分为一个列表，split()参数根据文件中的分隔符确定\n",
    "print(score_ls)  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "定义为函数，用列表推导式实现更简洁，类似问题只修改文件名和切分时使用的分隔符即可重用此代码。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as fr:\n",
    "        score_ls = [x.strip().split(',') for x in fr]\n",
    "    return score_ls\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    print(score_lst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为每位同学增加总成绩，可以将列表序号为0的元素score[0]拼接上['总分']，对剩余部分列表 score[1:] 进行遍历，每次得到一位同学的列表 x ，切片 x[1:] 为各门课的成绩，用map()函数将每个元素都 映射为整数，再用 sum() 函数对其进行求和并附加到列表中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as fr:\n",
    "        score_ls = [x.strip().split(',') for x in fr]\n",
    "    return score_ls\n",
    "\n",
    "\n",
    "def add_total(score_ls):\n",
    "    \"\"\"接收二维列表，增加总分，返回新的二维列表\"\"\"\n",
    "    title = score_ls[0] + ['总分']  # 列表拼接，一维列表\n",
    "    score_total = [title]          # 创建新列表，首元素为原列表标题行增加平均成绩\n",
    "    for x in score_ls[1:]:         # 略过标题行进行遍历\n",
    "        total = sum(map(int, x[1:]))          # 切取成绩部分映射为整数，计算总成绩\n",
    "        score_total.append(x + [str(total)])  # 每个列表拼接字符型平均成绩\n",
    "    return score_total\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    print(add_total(score_lst))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将增加了总成绩的列表数据写入到文件中"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as fr:\n",
    "        score_ls = [x.strip().split(',') for x in fr]\n",
    "    return score_ls\n",
    "\n",
    "\n",
    "def add_total(score_ls):\n",
    "    \"\"\"接收二维列表，增加总分，返回新的二维列表\"\"\"\n",
    "    title = score_ls[0] + ['总分']  # 列表拼接，一维列表\n",
    "    score_total = [title]          # 创建新列表，首元素为原列表标题行增加平均成绩\n",
    "    for x in score_ls[1:]:         # 略过标题行进行遍历\n",
    "        total = sum(map(int, x[1:]))          # 切取成绩部分映射为整数，计算总成绩\n",
    "        score_total.append(x + [str(total)])  # 每个列表拼接字符型平均成绩\n",
    "    return score_total\n",
    "\n",
    "\n",
    "def write_file(score_total, filename):\n",
    "    \"\"\"接收成绩二维列表和写入文件名的字符串，\n",
    "    将二维列表中的数据分行写入文件，每个列表中的数据写为一行，用逗号分隔\n",
    "    无返回值\"\"\"\n",
    "    with open(filename, 'w', encoding='utf-8') as fw:  # 写模式打开文件\n",
    "        for line in score_total:                       # 遍历二维列表\n",
    "            fw.write(','.join(line) + '\\n')   # 子列表元素用逗号拼接为字符串加换行符\n",
    "            \n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    score_total_ls = add_total(score_lst)\n",
    "    write_file(score_total_ls, 'score_total.csv')\n",
    "    print(read_file('score_total.csv'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "<font face='楷体' color='red' size=5> 练一练 </font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "修改实例9.4的程序，增加一个函数，可以获得文件中所有同学的名字，参考这个代码，获取并输出python课程的成绩的列表。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as fr:\n",
    "        score_ls = [x.strip().split(',') for x in fr]\n",
    "    return score_ls\n",
    "\n",
    "\n",
    "def get_name(score_ls):\n",
    "    \"\"\"接收二维列表，返回全部学生名字的列表\"\"\"\n",
    "    return [x[0] for x in score_ls[1:]]\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    print(get_name(score_lst))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "<font face='楷体' color='red' size=5> 练一练 </font>"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "修改上面的程序，增加一个函数，可以获得文件中所有同学的名字，参考这个代码，获取并输出python课程的成绩的列表。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# 获取并输出python课程的成绩的列表\n",
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as fr:\n",
    "        score_ls = [x.strip().split(',') for x in fr]\n",
    "    return score_ls\n",
    "\n",
    "\n",
    "def get_python(score_ls):\n",
    "    \"\"\"接收二维列表，返回全部学生python成绩的列表。\n",
    "    列表中成绩应为整数\n",
    "    \"\"\"\n",
    "    # 补充你的代码\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    print(get_python(score_lst))"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "输出：\n",
    "[85, 66, 96, 76, 91]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# 获取并输出python课程的成绩的列表\n",
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, 'r', encoding='utf-8') as fr:\n",
    "        score_ls = [x.strip().split(',') for x in fr]\n",
    "    return score_ls\n",
    "\n",
    "\n",
    "def get_python_avg(score_ls):\n",
    "    \"\"\"接收二维列表，返回全部学生python的平均成绩\"\"\"\n",
    "    # 补充你的代码\n",
    "    python = [int(x[3]) for x in score_ls[1:]]\n",
    "    return sum(python) / len(python)\n",
    "\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    print(get_python_avg(score_lst))"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "82.8"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "<font face='楷体' color='red' size=5> 练一练 </font>"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "修改实例9.4的程序，获得并输出转置的列表"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "提示：\n",
    "list(zip(*score_ls)) 可将二维列表中的数据行列转置，得到元素为元组的列表"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "[['姓名', 'C语言', 'Java', 'Python', 'C#', 'C++'], \n",
    " ['罗明', '95', '96', '85', '63', '91'], \n",
    " ['朱佳', '75', '93', '66', '85', '88'], \n",
    " ['李思', '86', '76', '96', '93', '67'], \n",
    " ['郑君', '88', '98', '76', '90', '89'], \n",
    " ['王雪', '99', '96', '91', '88', '86']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "转为"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "[('姓名', '罗明', '朱佳', '李思', '郑君', '王雪'), \n",
    "('C语言', '95', '75', '86', '88', '99'), \n",
    "('Java', '96', '93', '76', '98', '96'), \n",
    "('Python', '85', '66', '96', '76', '91'), \n",
    "('C#', '63', '85', '93', '90', '88'), \n",
    "('C++', '91', '88', '67', '89', '86')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def read_file(filename):\n",
    "    \"\"\"接收文件名为参数，读取文件中的数据为二维列表，返回行列转置后的二维列表\"\"\"\n",
    "    # 补充你的代码\n",
    "\n",
    "    \n",
    "    \n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_file('/data/bigfiles/5,7 score.csv')\n",
    "    print(score_lst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font face='楷体' color='red' size=5> 练一练 </font>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "附件文件<a href=\"https://data.educoder.net/api/attachments/3464255?type=office&disposition=inline\" target=\"_blank\">成绩分析综合.csv</a> 中记录了学生实验课的成绩，其中8次实验占总成绩的70%，作业成绩占总成绩的30%，请完成以下任务：\n",
    "1. 计算每位同学的总成绩\n",
    "2. 在平时成绩后增加总成绩，按总成绩降序排序，总成绩相同时按学号降序排序，写到文件“成绩分析汇总.csv”中\n"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%% raw\n"
    }
   },
   "source": [
    "姓名,学号,实验一,实验二,实验三,实验四,实验五,实验六,实验七,实验八,作业成绩\n",
    "叶灿,0121701100312,100,100,100,100,100,100,60,100,96\n",
    "朱宇轩,0121701100702,100,100,100,100,100,100,90,100,99\n",
    "陈一帆,0121701100730,100,100,100,100,50,100,80,75,89.25\n",
    "王璐,0121701100733,100,100,100,100,100,100,100,100,100\n",
    "......"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# 此代码为读csv文件到列表中，参考此代码完成题目\n",
    "def read_txt(filename):\n",
    "    \"\"\"读文件到二维列表\"\"\"\n",
    "    with open(filename, encoding='utf-8') as f:\n",
    "        return  [x.strip().split(',') for x in f]\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score = read_txt('/data/bigfiles/成绩分析综合.csv')\n",
    "    print(score)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "def read_csv(filename):\n",
    "    \"\"\"读文件到二维列表，精简代码\"\"\"\n",
    "    # 补充你的代码\n",
    "    \n",
    "\n",
    "\n",
    "def add_total(score_ls):\n",
    "    \"\"\"接收包含成绩的二维列表，标题行增加“总成绩”列，计算总成绩并附加到列表中，\n",
    "    返回增加了总成绩的二维列表。\n",
    "    \"\"\"\n",
    "    # 补充你的代码\n",
    "    \n",
    "\n",
    "\n",
    "def write_file(total_score, filename):\n",
    "    \"\"\"接收成绩二维列表和写入文件名的字符串，将二维列表中的数据分行写入文件，\n",
    "    每个列表中的数据写为一行，用逗号分隔，无返回值。\n",
    "    \"\"\"\n",
    "    # 补充你的代码\n",
    "\n",
    "    \n",
    "            \n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_csv('/data/bigfiles/成绩分析综合.csv')  # 读文件到列表\n",
    "    score_all = add_total(score_lst)  \n",
    "    write_file(score_all, '成绩分析综合2.csv')  # 写入到'成绩分析综合2.csv'\n",
    "    print(read_csv('成绩分析综合2.csv'))  # 再次调用读文件的函数，参数为新写入的文件名，查看是否写入成功\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%% md\n"
    }
   },
   "source": [
    "## JSON文件 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "JSON的编码过程是将一个Python对象转为JSON格式数据，  \n",
    "主要使用json.dumps(obj)和json.dump(obj,fp)两个方法。"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "source": [
    "json.dumps(obj, *, ensure_ascii=True, indent=None, sort_keys=False)\n",
    "json.dump(obj, fp, *, ensure_ascii=True, indent=None, sort_keys=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| 方法 | 描述 |\n",
    "| :---- | :---- |\n",
    "| json.dumps(obj) | 将Python 数据类型obj对象编码成JSON数据，写入内存。 |\n",
    "| json.dump(obj,fp) | 将Python 数据类型obj对象编码成JSON数据，写入磁盘文件对象fp中。 |"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "json 中默认ensure_ascii=True， 会将中文等非ASCII 字符转为unicode 编码（形如\\uXXXX），设置ensure_ascii=False 可以禁止JSON 将中文转为unicode 编码，保持中文原样输出。  \n",
    "Python中的字典数据转为JSON 默认不排序。可设置sort_keys=True使转换结果按照字典升序排序（a-z）。  \n",
    "indent 参数可用来对JSON 数据进行格式化输出，默认值为None，不做格式化处理，可设一个大于0 的整数表示缩进量，例如indent=4。输出的数据被格式化之后，变得可读性更好。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "JSON的**解码**过程是将JSON格式数据转为Python对象,Python 的原始类型与JSON类型会相互转换。  \n",
    "主要使用json.loads(s)和json.load(fp)两个方法。\n"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "json.load(fp)\n",
    "json.loads(s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| 方法 | 描述 |\n",
    "| :---- | :---- |\n",
    "| json.loads(s) | 将字符串s中的JSON数据解码为Python 数据类型，其他格式数据会转为unicode格式。 |\n",
    "| json.load(fp) | 将磁盘文件对象fp中的JSON数据解码为Python 数据类型，其他格式数据会转为unicode格式。 |\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "将csv文件\"成绩分析综合.csv\"转为json文件"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "1. 将文件读为二维列表\n",
    "2. 遍历二维列表序号1以后的元素\n",
    "    2.1 用zip()将二维列表序号为 0 的子列表分别与后续的子列表组合并用dict()转为字典类型\n",
    "3. 用dumps()将字典转为json格式数据写入json文件\n"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "读\"成绩分析综合.json\"为列表"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "1. 读文件创建文件对象fp\n",
    "    1.1 用json.load(fp)将文件内中的数据读为python数据类型\n",
    "2. 返回读取的结果"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "\n",
    "def read_csv(filename):\n",
    "    \"\"\"读文件到二维列表，返回二维列表\"\"\"\n",
    "    with open(filename, \"r\", encoding='utf-8') as f:\n",
    "        return [x.strip().split(',') for x in f]\n",
    "\n",
    "\n",
    "def to_json(score_ls):\n",
    "    \"\"\"将二维列表标题行的元素与子列表中的元素一一组合，\n",
    "    禁止JSON 将中文转为unicode 编码，保持中文原样输出。\n",
    "    对JSON 数据进行格式化输出，缩进4个字符，\n",
    "    返回json格式数据\"\"\"\n",
    "    score_dic = []\n",
    "    for i in range(1, len(score_ls)):\n",
    "        score_dic.append(dict(zip(score_ls[0], score_ls[i])))\n",
    "    return json.dumps(score_dic, ensure_ascii=False, indent=4)\n",
    "\n",
    "\n",
    "def write_to_json(score_ls):\n",
    "    \"\"\"将二维列表标题行的元素与子列表中的元素一一组合，\n",
    "    禁止JSON 将中文转为unicode 编码，保持中文原样输出。\n",
    "    对JSON 数据进行格式化输出，缩进4个字符，\n",
    "    写入json格式文件\"\"\"\n",
    "    score_dic = []\n",
    "    for i in range(1, len(score_ls)):\n",
    "        score_dic.append(dict(zip(score_ls[0], score_ls[i])))\n",
    "    with open('成绩分析综合.json','w',encoding='utf-8') as score_json:\n",
    "        json.dump(score_dic,score_json, ensure_ascii=False, indent=4)\n",
    "\n",
    "\n",
    "def read_json_to_ls(filename):\n",
    "    \"\"\"读json文件，返回元素为字典类型的列表\"\"\"\n",
    "    with open(filename, \"r\", encoding='utf8') as f:\n",
    "        score_dic_ls = json.load(f)\n",
    "    return score_dic_ls\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    score_lst = read_csv('/data/bigfiles/成绩分析综合.csv')\n",
    "    # print(to_json(score_lst))\n",
    "    write_to_json(score_lst)\n",
    "    print(read_json_to_ls('成绩分析综合.json'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "def read_json(filename):\n",
    "    \"\"\"读json文件，以字符串形式输出，查看写入文件的格式\"\"\"\n",
    "    with open(filename, \"r\", encoding='utf8') as f:\n",
    "        print(f.read())\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    read_json('成绩分析综合.json')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 借用eval()实现json转csv\n",
    "eval()可以将字符串转为python数据类型，所以可以先用read()将json文件内容读取为字符串，再用eval()转为列表，再转csv文件"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open('成绩分析综合.json',encoding='utf-8') as fr:\n",
    "    txt = fr.read()                          # txt为字符串\n",
    "\n",
    "txt_py = eval(txt)                           # txt_py为元素为字典的列表\n",
    "title = list(txt_py[0].keys())               # title为标题列表\n",
    "score = [list(i.values()) for i in txt_py]   # score为值的列表\n",
    "score_ls = [title]+score                     # score_ls为标题+值的列表\n",
    "\n",
    "with open('成绩分析综合.csv','w',encoding='utf-8') as fw:\n",
    "    for lst in score_ls:                     # 遍历列表\n",
    "        fw.write(','.join(lst)+'\\n')          # 将列表用逗号拼接为字符串并写入文件"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## os库常用方法 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "| 方法 | 描述 |\n",
    "| :---- | :---- |\n",
    "| os.getcwd() | 获取当前工作路径 |\n",
    "| os.chdir(path)    | 将当前工作路径修改为path，如os.chdir(r'c:\\\\Users') |\n",
    "| os.walk(path) | 返回类型为生成器，包含数据为若干包含文件和文件夹名的元组数据 |\n",
    "| os.listdir(path) | 以列表形式返回path路径下的所有文件名，不包括子路径中的文件名 |\n",
    "| os.mkdir(pathname) | 新建一个名为pathname的文件夹 |\n",
    "| os.rmdir(pathname)    | 删除空文件夹pathname，文件夹不为空则报OSError错误 |\n",
    "| os.remove(filename) | 删除文件  filename，文件不存在则报错 |\n",
    "| os.path.isfile(path) | 返回path是否是常规文件，是返回True，否则返回False |\n",
    "| os.path.isdir(path) | 返回path是否是目录，是返回True，否则返回False |\n",
    "| os.path.getsize(file)    | 文件file存在，返回其大小(byte为单位)，不存在则报错 |\n",
    "| os.path.exist(name)    | 判断name文件夹或文件是否存在，存在返回True，否则返回False |\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "# getcwd() 获取当前工作目录\n",
    "\n",
    "import os\n",
    "\n",
    "result = os.getcwd() \n",
    "print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "\n",
    "result = os.listdir('/data/workspace/')\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 实例9.6 批量改文件名 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font face='楷体' color='red' size=5> 重要：请运行下面单元的代码以获取实验数据文件！！！ </font>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "!tar -xvf /data/bigfiles/1cd06ad9-cef1-4267-b6fc-fbce214a089f.tar"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "\n",
    "def read_to_dic():\n",
    "    \"\"\"读股票代码文件，返回以股票代码为键，以股票名为值的字典\"\"\"\n",
    "    stock_dic = {}\n",
    "    with open('./stock_data/stock_basic.csv', 'r', encoding='utf-8') as stock:\n",
    "        stock.readline()  # 读取标题行不用，跳过\n",
    "        for line in stock:\n",
    "            code, x, stock_name = line.strip().split(',')[:3]  # x 数据无用，跳过\n",
    "            stock_dic[code] = stock_name                       # 股票代码为键，股票名为值\n",
    "    # print(stock_dic)\n",
    "    return stock_dic\n",
    "\n",
    "\n",
    "def re_name(file_ls):\n",
    "    \"\"\"遍历列表中文件名，修改对应文件名为股票代码对应的股票名\"\"\"\n",
    "    stock = {'SZ': '_深圳', 'SH': '_上海', 'BJ': '_北京'}  # 构建交易市场字典\n",
    "    for stock_code in file_ls:  #\n",
    "        stock_name = stock_dict.get(stock_code[:-4]) + stock.get(stock_code[-6:-4])\n",
    "        print(stock_name)\n",
    "        os.rename(path + stock_code, new_path + stock_name + '.csv')\n",
    "\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    path = './stock_data/old/'  # 当前路径下的stock文件夹中有类似600000.csv的多个文件\n",
    "    new_path = './stock_data/new/'  # 当前路径下的空文件夹\n",
    "    file_list = os.listdir(path)  # 获取文件夹中的文件名列表\n",
    "    print(file_list)  # ['600000.csv', '600004.csv', '600006.csv',...'沪市股票top300.csv']\n",
    "    stock_dict = read_to_dic()  # 获取代码与股票名的字典\n",
    "    re_name(file_list)\n",
    "    file_list = os.listdir(new_path)  # 获取new_path文件夹中的文件名列表\n",
    "    print(file_list)  # ['600000.csv', '600004.csv', '600006.csv',...'沪市股票top300.csv']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "new_path = './stock_data/new/'  # 当前路径下的空文件夹\n",
    "file_list = os.listdir(new_path)  # 获取new_path文件夹中的文件名列表\n",
    "print(file_list)  # ['600000.csv', '600004.csv', '600006.csv',...'沪市股票top300.csv']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://data.educoder.net/api/attachments/3464254?type=office&disposition=inline\" target=\"_blank\">5.7 score.csv</a>  \n",
    "<a href=\"https://data.educoder.net/api/attachments/3464255?type=office&disposition=inline\" target=\"_blank\">成绩分析综合.csv</a>  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "data = np.genfromtxt('/data/bigfiles/成绩分析综合.csv',delimiter=',',encoding='utf-8',dtype=str)\n",
    "# 数据的标题行中有#，其后面的内容默认会被当成注释而忽略掉，导致标题行与数据行的列数不同而出错，可以用comments标明注释符\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "data = np.genfromtxt('/data/bigfiles/5,7 score.csv',delimiter=',',encoding='utf-8',dtype=str,comments=None)\n",
    "# 数据的标题行中有#，其后面的内容默认会被当成注释而忽略掉，导致标题行与数据行的列数不同而出错，可以用comments标明注释符\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
